allow to read non-standard CSV #326
Thank you for the contribution @kazuk
This code looks really nice and clean. 👍
The only thing that I think is missing from this PR is some basic tests -- specifically, tests showing that everything is hooked together properly and that the options actually reach the csv reader.
I think such tests would be most valuable for ensuring that, as we change this code in the future, we don't accidentally break this new functionality. Given that we are delegating to the implementation in the `csv` crate, I don't think we need to test a large number of corner cases -- just that we can read a CSV file with a non-default escape, quote and terminator character.
Thanks again!
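To make the reviewer's request concrete, here is a sketch of what non-default quote, escape and terminator bytes do to otherwise similar input. It is a hypothetical, hand-rolled splitter: `Dialect` and `parse` are illustrative names, not the arrow or `csv` crate API, and it assumes byte-level dialect options similar in spirit to those this PR passes down to the csv reader (where the real parsing is delegated to the `csv` crate).

```rust
/// Hypothetical dialect settings for a toy CSV splitter. They mirror the kinds of
/// options this PR threads through (quote, escape, terminator); NOT a real library API.
#[derive(Clone, Copy, Debug)]
pub struct Dialect {
    pub delimiter: u8,
    pub quote: u8,
    /// `None`: a doubled quote inside a quoted field is a literal quote (RFC 4180 style).
    /// `Some(e)`: inside quotes, `e` makes the following byte literal.
    pub escape: Option<u8>,
    pub terminator: u8,
}

impl Default for Dialect {
    /// Standard CSV: comma delimiter, double quote, no escape byte, newline terminator.
    fn default() -> Self {
        Dialect { delimiter: b',', quote: b'"', escape: None, terminator: b'\n' }
    }
}

/// Splits `input` into records of fields according to `d`.
/// Simplified sketch: no CRLF handling, no errors for unbalanced quotes.
pub fn parse(input: &[u8], d: Dialect) -> Vec<Vec<String>> {
    let mut records = Vec::new();
    let mut record = Vec::new();
    let mut field: Vec<u8> = Vec::new();
    let mut in_quotes = false;
    let mut i = 0;
    while i < input.len() {
        let b = input[i];
        if in_quotes {
            if Some(b) == d.escape && i + 1 < input.len() {
                // Escape byte: take the next byte literally.
                field.push(input[i + 1]);
                i += 2;
                continue;
            }
            if b == d.quote {
                if d.escape.is_none() && input.get(i + 1) == Some(&d.quote) {
                    // Doubled quote: a literal quote character.
                    field.push(d.quote);
                    i += 2;
                    continue;
                }
                in_quotes = false; // closing quote
            } else {
                field.push(b);
            }
        } else if b == d.quote {
            in_quotes = true;
        } else if b == d.delimiter || b == d.terminator {
            record.push(String::from_utf8_lossy(&field).into_owned());
            field.clear();
            if b == d.terminator {
                records.push(std::mem::take(&mut record));
            }
        } else {
            field.push(b);
        }
        i += 1;
    }
    // Flush a final record that lacks a trailing terminator.
    if !field.is_empty() || !record.is_empty() {
        record.push(String::from_utf8_lossy(&field).into_owned());
        records.push(record);
    }
    records
}
```

For example, with `quote` set to `'`, the line `'x,y',z` yields the two fields `x,y` and `z`; under the default dialect the same line splits into three fields, `'x`, `y'` and `z`. A test per option along these lines is roughly the "everything is hooked together" check being asked for.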
Codecov Report
@@            Coverage Diff             @@
##           master     #326      +/-   ##
==========================================
- Coverage   82.52%   82.50%   -0.03%
==========================================
  Files         162      162
  Lines       44007    44036      +29
==========================================
+ Hits        36316    36331      +15
- Misses       7691     7705      +14
Thank you for the review.
I added some basic tests to detect if this feature breaks.
Looks great to me -- thank you @kazuk
The MIRI CI check is not related to this PR (edit: see #345). However, the lint failure seems to be: https://github.com/apache/arrow-rs/pull/326/checks?check_run_id=2658268180 I think it can be resolved by running …
Oh! Thanks. I ran …
The MIRI failure is unrelated to this PR: #345
Thanks again @kazuk
* refactor Reader::from_reader
  split into build_csv_reader, from_csv_reader
  add escape, quote, terminator arg to build_csv_reader
* add escape,quote,terminator field to ReaderBuilder
  schema inference support for non-standard CSV
  add fn infer_file_schema_with_csv_options
  add fn infer_reader_schema_with_csv_options
  ReaderBuilder support for non-standard CSV
  add escape, quote, terminator field
  add fn with_escape, with_quote, with_terminator
  change ReaderBuilder::build for non-standard CSV
* minimize API change
* add tests
  add #[test] fn test_non_std_quote
  add #[test] fn test_non_std_escape
  add #[test] fn test_non_std_terminator
* apply cargo fmt
Co-authored-by: kazuhiko kikuchi
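This PR splits reader construction into two stages: one function configures the low-level `csv` crate reader from optional escape, quote and terminator settings, and another wraps that configured reader, while `ReaderBuilder` stores the new options and passes them through in `build`. Below is a std-only sketch of that shape under stated assumptions: `Builder`, `TokenizerConfig`, `TypedReader`, `build_tokenizer` and `from_tokenizer` are hypothetical names, and the `Terminator` enum only loosely mirrors the `csv` crate's `Terminator::CRLF` / `Terminator::Any(u8)`.

```rust
/// Hypothetical mirror of a CSV tokenizer's record-terminator setting:
/// `Crlf` = accept \r, \n or \r\n; `Any(b)` = a single custom byte.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Terminator {
    Crlf,
    Any(u8),
}

/// Stage-1 output: fully resolved low-level tokenizer settings (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct TokenizerConfig {
    pub has_header: bool,
    pub delimiter: u8,
    pub quote: u8,
    pub escape: Option<u8>,
    pub terminator: Terminator,
}

/// Stage-2 output: a higher-level reader wrapping a configured tokenizer (hypothetical type).
#[derive(Debug)]
pub struct TypedReader {
    pub tokenizer: TokenizerConfig,
}

/// Stage 1 (similar in role to `build_csv_reader`): resolve optional settings,
/// falling back to standard CSV defaults for anything left as `None`.
fn build_tokenizer(
    has_header: bool,
    delimiter: Option<u8>,
    escape: Option<u8>,
    quote: Option<u8>,
    terminator: Option<u8>,
) -> TokenizerConfig {
    TokenizerConfig {
        has_header,
        delimiter: delimiter.unwrap_or(b','),
        quote: quote.unwrap_or(b'"'),
        escape, // None = no escape byte
        terminator: terminator.map_or(Terminator::Crlf, Terminator::Any),
    }
}

/// Stage 2 (similar in role to `from_csv_reader`): wrap an already-built tokenizer.
fn from_tokenizer(tokenizer: TokenizerConfig) -> TypedReader {
    TypedReader { tokenizer }
}

/// Hypothetical builder; `None` fields mean "use the standard CSV default".
#[derive(Debug, Default)]
pub struct Builder {
    has_header: bool,
    delimiter: Option<u8>,
    escape: Option<u8>,
    quote: Option<u8>,
    terminator: Option<u8>,
}

impl Builder {
    pub fn new() -> Self {
        Self::default()
    }
    pub fn has_header(mut self, has_header: bool) -> Self {
        self.has_header = has_header;
        self
    }
    pub fn delimiter(mut self, delimiter: u8) -> Self {
        self.delimiter = Some(delimiter);
        self
    }
    pub fn escape(mut self, escape: u8) -> Self {
        self.escape = Some(escape);
        self
    }
    pub fn quote(mut self, quote: u8) -> Self {
        self.quote = Some(quote);
        self
    }
    pub fn terminator(mut self, terminator: u8) -> Self {
        self.terminator = Some(terminator);
        self
    }
    /// Passes every stored option through stage 1, then wraps the result in stage 2.
    pub fn build(self) -> TypedReader {
        let tokenizer = build_tokenizer(
            self.has_header,
            self.delimiter,
            self.escape,
            self.quote,
            self.terminator,
        );
        from_tokenizer(tokenizer)
    }
}
```

Keeping every new setting as an `Option` lets `build` fall back to standard CSV behaviour when a caller never touches the new methods, which is one way to add options without changing existing call sites.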
Which issue does this PR close?
Closes #315.
Rationale for this change
What changes are included in this PR?
* Split `Reader::from_reader` into `Reader::build_csv_reader` and `Reader::from_csv_reader`.
  * `Reader::build_csv_reader` builds a `csv_crate::Reader` with all CSV reader options.
  * `Reader::from_csv_reader` builds an `arrow::csv::reader::Reader` from a `csv_crate::Reader`.
* Add fn `infer_file_schema_with_csv_options`.
* Change `infer_file_schema` to call `infer_file_schema_with_csv_options` with default options.
* Add fn `infer_reader_schema_with_csv_options`.
* Change `infer_reader_schema` to call `infer_reader_schema_with_csv_options` with default options.
* Add `escape`, `quote` and `terminator` to `ReaderBuilder`.
* `ReaderBuilder::build` uses the added options.

Are there any user-facing changes?
The API change is currently kept to a minimum. The user-facing additions are:
* `ReaderBuilder::escape`
* `ReaderBuilder::quote`
* `ReaderBuilder::terminator`

Please consider making these functions public as well: `infer_file_schema_with_csv_options`, `infer_reader_schema_with_csv_options`, `Reader::build_csv_reader` and `Reader::from_csv_reader`.
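The change list describes a delegation pattern: `infer_file_schema` and `infer_reader_schema` keep their existing signatures and call new `*_with_csv_options` variants with default options, which keeps the public API change minimal. A toy, std-only sketch of that pattern follows. `ColType`, `CsvOptions`, `infer_schema` and `infer_schema_with_csv_options` are hypothetical names, and the inference here (integer, then float, then string, on a `&str`) is a deliberate simplification, not arrow's implementation, which produces an Arrow `Schema`.

```rust
/// Toy column types (hypothetical; not Arrow's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ColType {
    Int64,
    Float64,
    Utf8,
}

/// Hypothetical bundle of CSV dialect options.
#[derive(Debug, Clone, Copy)]
pub struct CsvOptions {
    pub delimiter: u8,
    pub quote: u8,
    pub terminator: u8,
}

impl Default for CsvOptions {
    fn default() -> Self {
        CsvOptions { delimiter: b',', quote: b'"', terminator: b'\n' }
    }
}

fn infer_type(s: &str) -> ColType {
    if s.parse::<i64>().is_ok() {
        ColType::Int64
    } else if s.parse::<f64>().is_ok() {
        ColType::Float64
    } else {
        ColType::Utf8
    }
}

/// Combine two observed types into one that can hold both.
fn widen(a: ColType, b: ColType) -> ColType {
    match (a, b) {
        (x, y) if x == y => x,
        (ColType::Int64, ColType::Float64) | (ColType::Float64, ColType::Int64) => ColType::Float64,
        _ => ColType::Utf8,
    }
}

/// Options-aware variant. Simplified: fields may not contain the delimiter or
/// terminator, and quotes are only stripped from the ends of a field.
pub fn infer_schema_with_csv_options(input: &str, opts: CsvOptions) -> Vec<(String, ColType)> {
    let (delim, quote) = (opts.delimiter as char, opts.quote as char);
    let mut rows = input.split(opts.terminator as char).filter(|r| !r.is_empty());
    let header: Vec<String> = match rows.next() {
        Some(h) => h.split(delim).map(|f| f.trim_matches(quote).to_string()).collect(),
        None => return Vec::new(),
    };
    let mut types: Vec<Option<ColType>> = vec![None; header.len()];
    for row in rows {
        for (i, field) in row.split(delim).enumerate().take(header.len()) {
            let value = field.trim_matches(quote);
            if value.is_empty() {
                continue; // empty values do not constrain the type
            }
            let t = infer_type(value);
            types[i] = Some(types[i].map_or(t, |prev| widen(prev, t)));
        }
    }
    header
        .into_iter()
        .zip(types)
        .map(|(name, t)| (name, t.unwrap_or(ColType::Utf8)))
        .collect()
}

/// The original entry point keeps its signature and delegates with default options.
pub fn infer_schema(input: &str) -> Vec<(String, ColType)> {
    infer_schema_with_csv_options(input, CsvOptions::default())
}
```

With this shape, existing callers of the default entry point are unaffected, and making the options-aware variant public later would be a purely additive change.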